15 research outputs found
Universal Approximation with Deep Narrow Networks
The classical Universal Approximation Theorem holds for neural networks of
arbitrary width and bounded depth. Here we consider the natural `dual' scenario
for networks of bounded width and arbitrary depth. Precisely, let $n$ be the
number of input neurons, $m$ be the number of output neurons, and let $\rho$
be any nonaffine continuous function, with a continuous nonzero derivative at
some point. Then we show that the class of neural networks of arbitrary depth,
width $n + m + 2$, and activation function $\rho$, is dense in
$C(K; \mathbb{R}^m)$ for $K \subseteq \mathbb{R}^n$ with $K$ compact. This
covers every activation function possible to use in practice, and also includes
polynomial activation functions, unlike the classical version of the theorem,
providing a qualitative difference between deep narrow networks and shallow
wide networks. We then consider several extensions of this result. In
particular we consider nowhere differentiable activation functions, density in
noncompact domains with respect to the $L^p$-norm, and how the width may be
reduced to just $n + m + 1$ for `most' activation functions.
Comment: Accepted at COLT 2020
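The architecture class the theorem concerns is easy to make concrete. The sketch below is our own illustration, not code from the paper: a plain-Python network of one fixed hidden width whose capacity comes from depth alone, with illustrative function names and random weights. For $n = 2$ inputs and $m = 1$ output, the theorem's width is $n + m + 2 = 5$.

```python
import math
import random

def deep_narrow_forward(x, layers, activation=math.tanh):
    """Forward pass of a deep narrow network: a stack of hidden layers of
    one fixed width, followed by an affine readout (no final activation)."""
    h = x
    for W, b in layers[:-1]:
        h = [activation(sum(w * v for w, v in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    W, b = layers[-1]
    return [sum(w * v for w, v in zip(row, h)) + bi
            for row, bi in zip(W, b)]

def random_layers(n, m, width, depth, rng):
    """Randomly initialised layers: n inputs -> `depth` hidden layers of
    the given fixed width -> m outputs."""
    sizes = [n] + [width] * depth + [m]
    return [([[rng.uniform(-1, 1) for _ in range(fan_in)]
              for _ in range(fan_out)],
             [0.0] * fan_out)
            for fan_in, fan_out in zip(sizes, sizes[1:])]
```

Growing `depth` while holding `width` fixed at $n + m + 2$ traverses exactly the class of networks shown to be dense.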
Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU
Signatory is a library for calculating and performing functionality related
to the signature and logsignature transforms. The focus is on machine learning,
and as such includes features such as CPU parallelism, GPU support, and
backpropagation. To our knowledge it is the first GPU-capable library for these
operations. Signatory implements new features not available in previous
libraries, such as efficient precomputation strategies. Furthermore, several
novel algorithmic improvements are introduced, producing substantial real-world
speedups even on the CPU without parallelism. The library operates as a Python
wrapper around C++, and is compatible with the PyTorch ecosystem. It may be
installed directly via \texttt{pip}. Source code, documentation, examples,
benchmarks and tests may be found at
\texttt{\url{https://github.com/patrick-kidger/signatory}}. The license is
Apache-2.0.
Comment: Published at ICLR 2021
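Signatory itself operates on PyTorch tensors, but what the signature transform computes can be illustrated dependency-free. The sketch below is our own illustration, not Signatory code: the depth-2 signature of a piecewise-linear path, with the second level accumulated segment by segment via Chen's identity.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path, given as a list of
    points.  Level 1 is the total increment of the path.  Level 2 is
    built up one segment at a time using Chen's identity,
        S2(x * y) = S2(x) + S2(y) + S1(x) (tensor) S1(y),
    where a single linear segment with increment d has S2 = d (tensor) d / 2.
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for p, q in zip(path, path[1:]):
        d = [qi - pi for pi, qi in zip(p, q)]
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += d[i] * d[j] / 2.0 + s1[i] * d[j]
        s1 = [si + di for si, di in zip(s1, d)]
    return s1, s2
```

For the L-shaped path (0,0) -> (1,0) -> (1,1), level 1 is (1, 1) and level 2 is [[0.5, 1.0], [0.0, 0.5]]; the antisymmetric part of level 2 is the Lévy area. Signatory computes the same objects, to arbitrary depth, in parallel on batched tensors.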
Generalised Interpretable Shapelets for Irregular Time Series
The shapelet transform is a form of feature extraction for time series, in
which a time series is described by its similarity to each of a collection of
`shapelets'. However, it has previously suffered from a number of limitations,
such as being limited to regularly-spaced fully-observed time series, and
having to choose between efficient training and interpretability. Here, we
extend the method to continuous time, and in doing so handle the general case
of irregularly-sampled partially-observed multivariate time series.
Furthermore, we show that a simple regularisation penalty may be used to train
efficiently without sacrificing interpretability. The continuous-time
formulation additionally allows for learning the length of each shapelet
(previously a discrete object) in a differentiable manner. Finally, we
demonstrate that the measure of similarity between time series may be
generalised to a learnt pseudometric. We validate our method by demonstrating
its performance and interpretability on several datasets; for example we
discover (purely from data) that the digits 5 and 6 may be distinguished by the
chirality of their bottom loop, and that a kind of spectral gap exists in
spoken audio classification.
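For intuition, the classical discrete shapelet similarity that the paper generalises can be written in a few lines. This is our own sketch with an illustrative function name, covering only the regularly-sampled, fully-observed, Euclidean case; the paper's continuous-time, learnt-pseudometric formulation is strictly more general.

```python
def shapelet_discrepancy(series, shapelet):
    """Similarity of a regularly sampled series to a shapelet: the
    minimum mean squared difference over all alignment offsets.
    Small values mean the shapelet appears somewhere in the series."""
    L = len(shapelet)
    best = float("inf")
    for start in range(len(series) - L + 1):
        window = series[start:start + L]
        d = sum((w - s) ** 2 for w, s in zip(window, shapelet)) / L
        best = min(best, d)
    return best
```

A series is then described by its vector of discrepancies to each learnt shapelet, and those features feed a downstream classifier.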
"Hey, that's not an ODE": Faster ODE Adjoints with 12 Lines of Code
Neural differential equations may be trained by backpropagating gradients via
the adjoint method, which is another differential equation typically solved
using an adaptive-step-size numerical differential equation solver. A proposed
step is accepted if its error, \emph{relative to some norm}, is sufficiently
small; else it is rejected, the step is shrunk, and the process is repeated.
Here, we demonstrate that the particular structure of the adjoint equations
makes the usual choices of norm (such as $L^2$) unnecessarily stringent. By
replacing it with a more appropriate (semi)norm, fewer steps are unnecessarily
rejected and the backpropagation is made faster. This requires only minor code
modifications. Experiments on a wide range of tasks---including time series,
generative modeling, and physical control---demonstrate a median improvement of
40% fewer function evaluations. On some problems we see as much as 62% fewer
function evaluations, so that the overall training time is roughly halved.
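The mechanism is simple to sketch. The following is a hedged, library-free illustration, not the paper's actual torchdiffeq patch, and it uses a single tolerance for brevity: an adaptive solver accepts a proposed step when the tolerance-scaled error has (semi)norm at most one, and a seminorm simply ignores components whose accuracy need not be controlled step-by-step.

```python
def rms_norm(x):
    """Root-mean-square norm, a common default in adaptive ODE solvers."""
    return (sum(v * v for v in x) / len(x)) ** 0.5

def make_seminorm(active):
    """Seminorm measuring error only on the 'active' components and
    ignoring the rest (e.g. parameter-gradient channels of an adjoint
    state, whose step-by-step accuracy does not affect the solution)."""
    def seminorm(x):
        return rms_norm([v for v, keep in zip(x, active) if keep])
    return seminorm

def accept_step(error, tol, norm=rms_norm):
    """A proposed step is accepted iff its error, relative to the
    tolerance and measured in the chosen (semi)norm, is at most 1."""
    return norm([e / tol for e in error]) <= 1.0
```

A step whose error is dominated by inactive components is rejected under the full norm but accepted under the seminorm, which is exactly how fewer steps end up unnecessarily rejected.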
Neural Controlled Differential Equations for Online Prediction Tasks
Neural controlled differential equations (Neural CDEs) are a continuous-time
extension of recurrent neural networks (RNNs), achieving state-of-the-art
(SOTA) performance at modelling functions of irregular time series. In order to
interpret discrete data in continuous time, current implementations rely on
non-causal interpolations of the data. This is fine when the whole time series
is observed in advance, but means that Neural CDEs are not suitable for use in
\textit{online prediction tasks}, where predictions need to be made in
real-time: a major use case for recurrent networks. Here, we show how this
limitation may be rectified. First, we identify several theoretical conditions
that interpolation schemes for Neural CDEs should satisfy, such as boundedness
and uniqueness. Second, we use these to motivate the introduction of new
schemes that address these conditions, offering in particular measurability
(for online prediction), and smoothness (for speed). Third, we empirically
benchmark our online Neural CDE model on three continuous monitoring tasks from
the MIMIC-IV medical database: we demonstrate improved performance on all tasks
against ODE benchmarks, and on two of the three tasks against SOTA non-ODE
benchmarks.
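One of the causal schemes discussed, rectilinear interpolation, can be sketched without any libraries; the function name and knot representation below are our own. Each new observation is incorporated by first advancing the time channel and only then updating the value channel, so the embedded path never depends on future data and is usable online.

```python
def rectilinear_embedding(times, values):
    """Causal 'rectilinear' embedding of discrete observations into a
    continuous path, returned as a list of (time, value) knots to be
    joined by straight lines.  At each observation we first advance
    time while holding the value, then jump to the new value."""
    knots = [(times[0], values[0])]
    for t, x in zip(times[1:], values[1:]):
        prev_x = knots[-1][1]
        knots.append((t, prev_x))  # advance time, hold the old value
        knots.append((t, x))       # then update to the new value
    return knots
```

Contrast this with, say, natural cubic splines, which are smooth (good for solver speed) but non-causal: every knot depends on the whole series.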
Neural Rough Differential Equations for Long Time Series
Neural controlled differential equations (CDEs) are the continuous-time
analogue of recurrent neural networks, as Neural ODEs are to residual networks,
and offer a memory-efficient continuous-time way to model functions of
potentially irregular time series. Existing methods for computing the forward
pass of a Neural CDE involve embedding the incoming time series into path
space, often via interpolation, and using evaluations of this path to drive the
hidden state. Here, we use rough path theory to extend this formulation.
Instead of directly embedding into path space, we represent the input
signal over small time intervals through its \textit{log-signature}: statistics
describing how the signal drives a CDE. This is the approach used to solve
\textit{rough differential equations} (RDEs), and correspondingly we
describe our main contribution as the introduction of Neural RDEs. This
extension has a purpose: by generalising the Neural CDE approach to a broader
class of driving signals, we demonstrate particular advantages for tackling
long time series. In this regime, we demonstrate efficacy on problems of length
up to 17k observations and observe significant training speed-ups, improvements
in model performance, and reduced memory requirements compared to existing
approaches.
Comment: Published at ICML 2021
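As a dependency-free sketch of the idea (ours, not the paper's implementation, which uses the Signatory library and higher truncation depths), here is the depth-2 log-signature of a 2D path over one window (the increment plus the Lévy area), and how a long series is summarised window by window:

```python
def logsig_depth2(path):
    """Depth-2 log-signature of a piecewise-linear 2D path over one
    interval: the increment (dx, dy) together with the Levy area, the
    antisymmetric part of the second signature level."""
    x0, y0 = path[0]
    dx = path[-1][0] - x0
    dy = path[-1][1] - y0
    area = 0.0
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        # signed area swept by each segment, relative to the start point
        area += 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
    return dx, dy, area

def windowed_features(path, window):
    """Summarise a long path by log-signatures over short windows: the
    statistics that drive the hidden state of a Neural RDE."""
    return [logsig_depth2(path[i:i + window + 1])
            for i in range(0, len(path) - 1, window)]
```

The hidden state then takes one update per window rather than one per observation, which is where the speed and memory gains on long series come from.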
Combined BIMA and OVRO observations of comet C/1999 S4 (LINEAR)
We present results from an observing campaign of the molecular content of the
coma of comet C/1999 S4 (LINEAR) carried out jointly with the millimeter-arrays
of the Berkeley-Illinois-Maryland Association (BIMA) and the Owens Valley Radio
Observatory (OVRO). Using the BIMA array in autocorrelation (`single-dish')
mode, we detected weak HCN J=1-0 emission from comet C/1999 S4 (LINEAR) at 14
+- 4 mK km/s averaged over the 143" beam. The three days over which emission
was detected, 2000 July 21.9-24.2, immediately precede the reported full
breakup of the nucleus of this comet. During this same period, we find an upper
limit for HCN 1-0 of 144 mJy/beam km/s (203 mK km/s) in the 9"x12" synthesized
beam of combined observations of BIMA and OVRO in cross-correlation (`imaging')
mode. Together with reported values of HCN 1-0 emission in the 28" IRAM
30-meter beam, our data probe the spatial distribution of the HCN emission from
radii of 1300 to 19,000 km. Using literature results of HCN excitation in
cometary comae, we find that the relative line fluxes in the 12"x9", 28" and
143" beams are consistent with expectations for a nuclear source of HCN and
expansion of the volatile gases and evaporating icy grains following a Haser
model.
Comment: 18 pages, 3 figures. Uses aastex. AJ in press.
The Comet Interceptor Mission
Here we describe the novel, multi-point Comet Interceptor mission. It is dedicated to the exploration of a little-processed long-period comet, possibly entering the inner Solar System for the first time, or to encounter an interstellar object originating at another star. The objectives of the mission are to address the following questions: What are the surface composition, shape, morphology, and structure of the target object? What is the composition of the gas and dust in the coma, its connection to the nucleus, and the nature of its interaction with the solar wind?

The mission was proposed to the European Space Agency in 2018, and formally adopted by the agency in June 2022, for launch in 2029 together with the Ariel mission. Comet Interceptor will take advantage of the opportunity presented by ESA's F-Class call for fast, flexible, low-cost missions to which it was proposed. The call required a launch to a halo orbit around the Sun-Earth L2 point. The mission can take advantage of this placement to wait for the discovery of a suitable comet reachable with its minimum ΔV capability of 600 m/s.

Comet Interceptor will be unique in encountering and studying, at a nominal closest approach distance of 1000 km, a comet that represents a near-pristine sample of material from the formation of the Solar System. It will also add a capability that no previous cometary mission has had, which is to deploy two sub-probes, B1 (provided by the Japanese space agency, JAXA) and B2, that will follow different trajectories through the coma. While the main probe passes at a nominal 1000 km distance, probes B1 and B2 will follow different chords through the coma at distances of 850 km and 400 km, respectively. The result will be unique, simultaneous, spatially resolved information on the 3-dimensional properties of the target comet and its interaction with the space environment.
We present the mission's science background leading to these objectives, as well as an overview of the scientific instruments, mission design, and schedule.
On neural differential equations
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations.
NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides.
This doctoral thesis provides an in-depth survey of the field.
Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions).
Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation).
We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
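The remark that residual networks are discretisations of differential equations is concrete enough to show in a few lines. This is a minimal sketch with illustrative names, not code from the thesis: the explicit Euler scheme applied to dx/dt = f(t, x) produces exactly the residual update x <- x + h f(t, x).

```python
def euler_flow(f, x0, t0, t1, steps):
    """Integrate dx/dt = f(t, x) with the explicit Euler method.  Each
    update x <- x + h * f(t, x) has precisely the form of a residual
    block, which is the sense in which residual networks are
    discretised ODEs (with f played by a learnt neural network)."""
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x = x + h * f(t, x)
        t += h
    return x
```

Replacing the hand-written `f` with a trained network, and Euler with an adaptive solver, gives a neural ODE; replacing the deterministic dynamics with driven or stochastic ones gives the CDE and SDE variants surveyed in the thesis.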